
    Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo

    Background: Coalescent theory is a general framework to model genetic variation in a population. Specifically, it allows inference about population parameters from sampled DNA sequences. However, most currently employed variants of coalescent theory consider only very simple demographic scenarios of population size change, such as exponential growth. Results: Here we develop a coalescent approach that allows Bayesian non-parametric estimation of the demographic history using genealogies reconstructed from sampled DNA sequences. In this framework, inference and model selection are done using reversible jump Markov chain Monte Carlo (MCMC). This method is computationally efficient and overcomes the limitations of related non-parametric approaches such as the skyline plot. We validate the approach using simulated data. Subsequently, we reanalyze HIV-1 sequence data from Central Africa and Hepatitis C virus (HCV) data from Egypt. Conclusions: The new method provides a Bayesian procedure for non-parametric estimation of the demographic history. By construction it additionally provides confidence limits and may be used jointly with other MCMC-based coalescent approaches.

    Learning causal networks from systems biology time course data: an effective model selection procedure for the vector autoregressive process

    Background: Causal networks based on the vector autoregressive (VAR) process are a promising statistical tool for modeling regulatory interactions in a cell. However, learning these networks is challenging due to the low sample size and high dimensionality of genomic data. Results: We present a novel and highly efficient approach to estimate a VAR network. It proceeds in two steps: (i) improved estimation of the VAR regression coefficients using an analytic shrinkage approach, and (ii) subsequent model selection by testing the associated partial correlations. In simulations with small sample sizes, this approach outperformed all other considered approaches in terms of true discovery rate (number of correctly identified edges relative to the significant edges). Moreover, the analysis of expression time series data from Arabidopsis thaliana resulted in a biologically sensible network. Conclusion: Statistical learning of large-scale VAR causal models can be done efficiently by the proposed procedure, even in the difficult data situations prevalent in genomics and proteomics. Availability: The method is implemented in R code that is available from the authors on request.
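
Step (i) of the abstract above can be sketched in a few lines. This is an illustrative sketch only, not the authors' R implementation: plain ridge shrinkage stands in for their analytic shrinkage estimator, and the VAR(1) model, penalty value, and simulated coefficient matrix are assumptions made for the demonstration.

```python
import numpy as np

def shrinkage_var(X, lam=0.1):
    """Estimate VAR(1) coefficients A in  x[t+1] = A @ x[t] + noise.

    `lam` is a ridge-style shrinkage penalty standing in for the paper's
    analytic shrinkage estimator; it stabilises the regression when the
    number of genes exceeds the number of time points.
    """
    past, future = X[:-1], X[1:]   # lagged and current observations
    p = X.shape[1]
    # Ridge-regularised least squares: future ~= past @ A.T
    A = np.linalg.solve(past.T @ past + lam * np.eye(p), past.T @ future).T
    return A

# Tiny demonstration on data simulated from a known coefficient matrix
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2],
                   [0.0, 0.3]])
X = np.zeros((2000, 2))
for t in range(1999):
    X[t + 1] = A_true @ X[t] + 0.1 * rng.standard_normal(2)
A_hat = shrinkage_var(X, lam=0.01)
```

In the paper's step (ii), edges are then selected by testing the partial correlations associated with these coefficients; in a sketch like this, thresholding the entries of `A_hat` serves the same screening purpose.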

    Gene network reconstruction from microarray data

    Background: Often, software available for biological pathway reconstruction relies on literature searches to find links between genes. The aim of this study is to reconstruct gene networks from microarray data, using Graphical Gaussian models. Results: The GeneNet R package was applied to the Eadgene chicken infection data set. No significant edges were found for the list of differentially expressed genes between conditions MM8 and MA8. On the other hand, a large number of significant edges were found among 85 differentially expressed genes between conditions MM8 and MM24. Conclusion: Many edges were inferred from the microarray data. Most of them could, however, not be validated using other pathway reconstruction software. This was partly because a quite large proportion of the differentially expressed genes were not annotated. Further biological validation is therefore needed for these networks, using for example in vitro invalidation of genes.

    Probabilistic modeling and machine learning in structural and systems biology

    This supplement contains extended versions of a selected subset of papers presented at the workshop PMSB 2006 (Probabilistic Modeling and Machine Learning in Structural and Systems Biology), held in Tuusula, Finland, from June 17 to 18, 2006.

    Identifying Modules of Coexpressed Transcript Units and Their Organization of Saccharopolyspora erythraea from Time Series Gene Expression Profiles

    BACKGROUND: The Saccharopolyspora erythraea genome sequence was released in 2007. In order to look at gene regulation at the whole-transcriptome level, an expression microarray was specifically designed on the S. erythraea strain NRRL 2338 genome sequence. Based on these data, we set out to investigate the potential transcriptional regulatory networks and their organization. METHODOLOGY/PRINCIPAL FINDINGS: In view of the hierarchical structure of bacterial transcriptional regulation, we constructed a hierarchical coexpression network at the whole-transcriptome level. A total of 27 modules were identified from 1255 differentially expressed transcript units (TUs) across the time course, and these were further classified into four groups. Functional enrichment analysis indicated the biological significance of our hierarchical network. It indicated that primary metabolism is activated in the first rapid growth phase (phase A), and secondary metabolism is induced when growth slows down (phase B). Among the 27 modules, two are highly correlated with erythromycin production. One contains all genes in the erythromycin-biosynthetic (ery) gene cluster, and the other appears to be associated with erythromycin production by sharing common intermediate metabolites. Non-concomitant correlation between production and expression regulation was observed. In particular, by calculating partial correlation coefficients and building a network based on a Gaussian graphical model, intrinsic associations between modules were found, and the association between the two erythromycin-production-correlated modules was included, as expected. CONCLUSIONS: This work created a hierarchical model clustering transcriptome data into coordinated modules, and modules into groups across the time course, giving insight into the concerted transcriptional regulation, especially the regulation corresponding to erythromycin production in S. erythraea. This strategy may be extendable to studies of other prokaryotic microorganisms.

    NetDiff – Bayesian model selection for differential gene regulatory network inference

    Differential networks allow us to better understand the changes in cellular processes that are exhibited in conditions of interest, identifying variations in gene regulation or protein interaction between, for example, cases and controls, or in response to external stimuli. Here we present a novel methodology for the inference of differential gene regulatory networks from gene expression microarray data. Specifically, we apply a Bayesian model selection approach to compare models of conserved and varying network structure, and use Gaussian graphical models to represent the network structures. We apply a variational inference approach to the learning of Gaussian graphical models of gene regulatory networks, which enables Bayesian model selection that is significantly more computationally efficient than Markov chain Monte Carlo approaches. When applied to synthetic network data, our method is more robust than independent analysis of data from multiple conditions, generating fewer false positive predictions of differential edges. We demonstrate the utility of our approach on real-world gene expression microarray data by applying it to existing data from amyotrophic lateral sclerosis cases with and without mutations in C9orf72, and controls, where we are able to identify differential network interactions for further investigation.

    A close examination of double filtering with fold change and t test in microarray analysis

    Background: Many researchers use the double filtering procedure with fold change and t test to identify differentially expressed genes, in the hope that the double filtering will provide extra confidence in the results. Due to its simplicity, the double filtering procedure has been popular with applied researchers despite the development of more sophisticated methods. Results: This paper, for the first time to our knowledge, provides theoretical insight into the drawback of the double filtering procedure. We show that fold change assumes all genes have a common variance, while the t statistic assumes gene-specific variances. The two statistics are based on contradictory assumptions. Under the assumption that gene variances arise from a mixture of a common variance and gene-specific variances, we develop the theoretically most powerful likelihood ratio test statistic. We further demonstrate that posterior inference based on a Bayesian mixture model and the widely used significance analysis of microarrays (SAM) statistic are better approximations to the likelihood ratio test than the double filtering procedure. Conclusion: We demonstrate through hypothesis testing theory, simulation studies and real data examples that well-constructed shrinkage testing methods, which can be united under the mixture gene variance assumption, can considerably outperform the double filtering procedure.
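
The double filtering procedure the abstract critiques is easy to state concretely. The sketch below is a generic illustration, not code from the paper; the cutoffs, sample sizes, and synthetic genes are invented for the demonstration. It keeps only genes passing both a log-fold-change cutoff and a two-sample t-statistic cutoff.

```python
import numpy as np

def double_filter(g1, g2, fc_cut=1.0, t_cut=2.0):
    """Keep genes passing BOTH filters: |log fold change| > fc_cut
    and |two-sample t statistic| > t_cut.

    g1, g2: (genes x replicates) arrays of log-scale expression values.
    The fold-change filter implicitly assumes a common variance across
    genes, while the t filter uses gene-specific variances -- the
    contradictory pair of assumptions the paper analyses.
    """
    n1, n2 = g1.shape[1], g2.shape[1]
    fc = g1.mean(axis=1) - g2.mean(axis=1)          # log fold change
    s2 = ((n1 - 1) * g1.var(axis=1, ddof=1) +
          (n2 - 1) * g2.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    t = fc / np.sqrt(s2 * (1.0 / n1 + 1.0 / n2))
    return (np.abs(fc) > fc_cut) & (np.abs(t) > t_cut)

# Two synthetic genes: one truly shifted between groups, one null
rng = np.random.default_rng(1)
g1 = np.vstack([2.0 + 0.1 * rng.standard_normal(5),   # shifted gene
                0.1 * rng.standard_normal(5)])        # null gene
g2 = 0.1 * rng.standard_normal((2, 5))
flags = double_filter(g1, g2)
```

A shrinkage statistic of the kind the paper favours replaces the denominator above with something like `s0 + se` (SAM's fudge constant), interpolating between the common-variance and gene-specific-variance extremes rather than filtering on both.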

    Constraint-based probabilistic learning of metabolic pathways from tomato volatiles

    Clustering and correlation analysis techniques have become popular tools for the analysis of data produced by metabolomics experiments. The results obtained from these approaches provide an overview of the interactions between objects of interest. Often in these experiments, one is more interested in information about the nature of these relationships, e.g., cause-effect relationships, than in the actual strength of the interactions. Finding such relationships is of crucial importance, as most biological processes can only be understood in this way. Bayesian networks represent these cause-effect relationships among variables of interest in terms of whether and how they influence each other given that a third, possibly empty, group of variables is known. The technique also allows the incorporation of prior knowledge as established from the literature or from biologists. The representation of these relationships as a directed graph is highly intuitive and helps in understanding the underlying processes. This paper describes how constraint-based Bayesian networks can be applied to metabolomics data and used to uncover the important pathways that play a significant role in the ripening of fresh tomatoes. We also show that this method of reconstructing pathways is intuitive and performs better than classical techniques. Methods for learning Bayesian network models are powerful tools for the analysis of data of the magnitude generated by metabolomics experiments. They allow one to model cause-effect relationships and help in understanding the underlying processes.
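
Constraint-based structure learning of the kind described above rests on conditional-independence tests. The following sketch is a simplification, not the paper's procedure: it runs only the order-0/order-1 step of a PC-style search, with an arbitrary threshold and simulated data, and removes an edge whenever some single conditioning variable makes the partial correlation of its endpoints negligible.

```python
import numpy as np
from itertools import combinations

def partial_corr(C, i, j, k):
    """Correlation of variables i and j after controlling for k,
    computed from the plain correlation matrix C."""
    return ((C[i, j] - C[i, k] * C[j, k]) /
            np.sqrt((1 - C[i, k] ** 2) * (1 - C[j, k] ** 2)))

def skeleton(X, thresh=0.1):
    """Start from the complete graph and drop edge (i, j) if i and j
    look independent marginally or given any single other variable."""
    p = X.shape[1]
    C = np.corrcoef(X, rowvar=False)
    edges = set(combinations(range(p), 2))
    for i, j in list(edges):
        others = [k for k in range(p) if k not in (i, j)]
        if abs(C[i, j]) < thresh or any(
                abs(partial_corr(C, i, j, k)) < thresh for k in others):
            edges.discard((i, j))
    return edges

# Chain x -> y -> z: x and z are correlated, but independent given y,
# so the direct x-z edge should be removed.
rng = np.random.default_rng(2)
x = rng.standard_normal(2000)
y = x + 0.5 * rng.standard_normal(2000)
z = y + 0.5 * rng.standard_normal(2000)
E = skeleton(np.column_stack([x, y, z]))
```

A full constraint-based learner would continue with larger conditioning sets and then orient the surviving edges; incorporating prior knowledge, as the abstract mentions, amounts to fixing some edges as present or absent before the search.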

    Randomization in Laboratory Procedure Is Key to Obtaining Reproducible Microarray Results

    The quality of gene expression microarray data has improved dramatically since the first arrays were introduced in the late 1990s. However, the reproducibility of data generated at multiple laboratory sites remains a matter of concern, especially for scientists who are attempting to combine and analyze data from public repositories. We have carried out a study in which a common set of RNA samples was assayed five times in four different laboratories using Affymetrix GeneChip arrays. We observed dramatic differences in the results across laboratories and identified batch effects in array processing as one of the primary causes for these differences. When batch processing of samples is confounded with experimental factors of interest, it is not possible to separate their effects, and lists of differentially expressed genes may include many artifacts. This study demonstrates the substantial impact of sample processing on microarray analysis results and underscores the need for randomization in the laboratory as a means to avoid confounding of biological factors with procedural effects.
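
The recommendation above can be made concrete in a few lines. This sketch is an invented illustration (the group labels, batch count, and seed are not from the study): it shuffles the sample list before splitting it into processing batches, so that a biological group is not processed as a single block.

```python
import random

def randomize_batches(sample_ids, n_batches, seed=42):
    """Assign samples to processing batches in random order, so that
    biological groups end up interleaved across batches instead of
    being confounded with a single batch."""
    rng = random.Random(seed)   # fixed seed keeps the design reproducible
    order = list(sample_ids)
    rng.shuffle(order)
    return {b: order[b::n_batches] for b in range(n_batches)}

# Eight hypothetical samples, four cases and four controls, two batches
samples = ["case1", "case2", "case3", "case4",
           "ctrl1", "ctrl2", "ctrl3", "ctrl4"]
batches = randomize_batches(samples, n_batches=2)
```

Without the shuffle (for example, processing all cases in one batch and all controls in another), any batch effect would be statistically inseparable from the case/control contrast, which is exactly the confounding the study warns against.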
